
Gemma3: model.safetensors.index.json differs from the original after finetuning, so the model cannot be served with vLLM #8243


Closed
1 task done
junleiz opened this issue May 31, 2025 · 8 comments
Labels
solved This problem has been already solved

Comments


junleiz commented May 31, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llamafactory version: 0.9.3.dev0
Platform: Linux-4.18.0-348.7.1.el8_5.x86_64-x86_64-with-glibc2.28
Python version: 3.11.0
PyTorch version: 2.6.0+cu124 (GPU)
Transformers version: 4.52.3
Datasets version: 3.6.0
Accelerate version: 1.7.0
PEFT version: 0.15.2
TRL version: 0.9.6
GPU type: NVIDIA A100-SXM4-80GB
GPU number: 8
GPU memory: 79.14GB
vLLM version: 0.8.5.post1
Git commit: 2c464f329dcd798a0b6b7aaed4719b67dec0c099
Default data directory: not detected

Reproduction

```yaml
### model
model_name_or_path: /storage/home/westlakeLab/zhangjunlei/models/google/gemma-3-12b-it

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true  # choices: [true, false]
freeze_multi_modal_projector: true  # choices: [true, false]
freeze_language_model: false  # choices: [true, false]
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: phone_web_0131_fix_merge_1500_wait_scroll_fix_hover
template: gemma3
cutoff_len: 8192
max_samples: 1000000000
overwrite_cache: false
preprocessing_num_workers: 256
dataset_dir: /backup/lanzhenzhongLab/junleizhang/dataset

### output
output_dir: /backup/lanzhenzhongLab/junleizhang/output/gemma3_phone_web_0131_fix_merge_1500_wait_scroll_fix_hover
logging_steps: 10
save_strategy: epoch
plot_loss: true
overwrite_output_dir: true
save_total_limit: 1

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 2.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000
image_max_pixels: 1048576
report_to: wandb
mix_strategy: concat
use_fast_tokenizer: true
disable_shuffling: true
```
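A config like this is launched with the LLaMA-Factory CLI; the config filename below is only an assumed example:

```shell
llamafactory-cli train examples/train_full/gemma3_full_sft.yaml
```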

After finetuning the loss looks normal, but the saved model's model.safetensors.index.json differs from the original model's, which causes the error: there is no module or parameter named 'lm_head' in Gemma3ForConditionalGeneration

I checked, and there is indeed an extra lm_head entry. The original model's model.safetensors.index.json:

Image

After finetuning:

Image

Others

No response

junleiz added the bug and pending labels on May 31, 2025
Kuangdd01 (Collaborator) commented May 31, 2025

The cause is here: Gemma models tie lm_head to the input embeddings, and HF copies the embedding into a separate lm_head tensor, which then gets saved with the checkpoint:
https://github.com/huggingface/transformers/blob/51d732709e5ae424e8fb6c4e58b72057a3e413c2/src/transformers/models/gemma3/modeling_gemma3.py#L806-L824

A workable fix:
https://github.com/vllm-project/vllm/blob/7782464a1714f6081ca06f47b75e824b14316c72/vllm/model_executor/models/gemma3_mm.py#L696-L699
Skip loading the lm_head key there.
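For illustration only (this is not the actual vLLM code; the function below is a hypothetical sketch of the idea), skipping the duplicated key during weight loading would look roughly like this:

```python
# Hypothetical sketch: filter out the duplicated lm_head tensor before the
# loader consumes the checkpoint, so the tied embedding is reused instead.
def skip_tied_lm_head(weights):
    for name, tensor in weights:
        if name.startswith("lm_head"):
            continue  # weight is tied to the input embedding; drop the saved copy
        yield name, tensor
```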

junleiz (Author) commented May 31, 2025

Thank you very much for the reply.

Is it possible to avoid saving lm_head during training?

I'm not very familiar with vLLM. If vLLM has to be modified, is this the line to change? https://github.com/vllm-project/vllm/blob/7782464a1714f6081ca06f47b75e824b14316c72/vllm/model_executor/models/utils.py#L274

That is, skip the weight if its name is "lm_head"?

junleiz (Author) commented May 31, 2025

Would deleting that key directly from model.safetensors.index.json have the same effect?

Kuangdd01 (Collaborator) commented:

I don't think that would work; the weights are loaded according to the keys actually stored in the safetensors shards, not just the index.
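If you prefer to fix the checkpoint itself rather than patch vLLM, a minimal sketch would be to drop the tensor from the shard and the index together. The checkpoint path here is an assumption, and this is untested against vLLM, so back up the files first:

```python
import json
from pathlib import Path

from safetensors.torch import load_file, save_file

ckpt_dir = Path("/disk2/output/gemma-3-12b-it_sft")  # hypothetical checkpoint dir
index_path = ckpt_dir / "model.safetensors.index.json"
index = json.loads(index_path.read_text())

# Remove the duplicated lm_head entry from the index and from its shard.
shard_name = index["weight_map"].pop("lm_head.weight", None)
if shard_name is not None:
    shard_path = ckpt_dir / shard_name
    tensors = load_file(shard_path)
    tensors.pop("lm_head.weight", None)  # drop the saved copy of the tied weight
    save_file(tensors, shard_path, metadata={"format": "pt"})
    index_path.write_text(json.dumps(index, indent=2))
```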

junleiz (Author) commented Jun 1, 2025

Could you provide a conversion script? I'm not very familiar with vLLM and I'm afraid I might change it incorrectly. For now I'm trying to deploy with this config

```yaml
model_name_or_path: /disk2/output/gemma-3-12b-it_sft
template: gemma3
infer_backend: huggingface  # choices: [huggingface, vllm, sglang]
trust_remote_code: true
```

```shell
CUDA_VISIBLE_DEVICES=5,6 API_PORT=5002 llamafactory-cli api examples/inference/gemma3.yaml
```
but it fails with:
```text
return func(*args, **kwargs)
       ^^^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/transformers/generation/utils.py", line 2597, in generate
  result = self._sample(
           ^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/transformers/generation/utils.py", line 3560, in _sample
  outputs = model_forward(**model_inputs, return_dict=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
  return fn(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
  return forward_call(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
  return self._torchdynamo_orig_callable(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
  return _compile(
         ^^^^^^^^^
File "/data/users/zhangjunlei/anaconda3/envs/lf/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 925, in _compile
  raise RecompileLimitExceeded(f"{limit_type} reached")
torch._dynamo.exc.RecompileLimitExceeded: cache_size_limit reached
```

Kuangdd01 (Collaborator) commented:

You can upgrade transformers to 4.52.4; versions 4.52.1 through 4.52.3 all have some bugs.
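The upgrade itself is just a standard pip pin:

```shell
pip install -U "transformers==4.52.4"
```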

junleiz (Author) commented Jun 2, 2025

Do I need to retrain?

Kuangdd01 (Collaborator) commented:

> Do I need to retrain?

Yes, you do.

hiyouga added the solved label and removed the bug and pending labels on Jun 3, 2025
hiyouga closed this as completed on Jun 3, 2025